Simplified molecular input line entry specification

The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
   The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc). Recently, IUPAC has introduced InChI as a standard for representing chemical structures. SMILES is generally considered to be somewhat more human-readable than InChI. It also has a wide base of software support and an extensive theoretical foundation in graph theory.

Canonical SMILES and Isomeric SMILES

The term Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
   The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.
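In isomeric SMILES, isotopes and chirality are written inside square-bracket atoms, for example [13CH4] for carbon-13 methane or [C@@H] for a stereocentre. The sketch below pulls such an atom apart into its fields. It assumes a simplified bracket grammar (isotope, symbol, '@' or '@@', hydrogen count, charge), modelled loosely on the OpenSMILES form and not covering atom classes or extended chirality classes. `parse_bracket_atom` is a hypothetical helper, not a library API.

```python
import re

# Simplified bracket atom: '[' isotope? symbol chirality? H-count? charge? ']'
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>se|as|[A-Z][a-z]?|[bcnops])"   # element, or aromatic lowercase
    r"(?P<chirality>@@?)?"                      # '@' or '@@' only (assumption)
    r"(?P<hcount>H\d?)?"                        # attached hydrogens
    r"(?P<charge>\+\+|--|[+-]\d*)?\]"           # '+', '-', '++', '--', '+2', ...
)


def parse_bracket_atom(text):
    """Split a bracket atom such as '[13CH4]' into a dict of its fields."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    charge_text = m["charge"] or ""
    if charge_text in ("++", "--"):
        charge = 2 if charge_text == "++" else -2
    elif charge_text:
        sign = 1 if charge_text[0] == "+" else -1
        charge = sign * int(charge_text[1:] or "1")  # bare '+'/'-' means 1
    else:
        charge = 0
    h_text = m["hcount"]
    return {
        "isotope": int(m["isotope"]) if m["isotope"] else None,
        "symbol": m["symbol"],
        "chirality": m["chirality"],  # None, "@" or "@@"
        "hydrogens": int(h_text[1:] or "1") if h_text else 0,
        "charge": charge,
    }
```

For example, `parse_bracket_atom("[OH-]")` reports one hydrogen and a charge of -1, matching the hydroxide anion described under Examples.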

Graph-based definition

In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
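A minimal Python sketch of this procedure follows. It assumes hydrogens have already been trimmed, the molecule is one connected component, every atom is in the bracket-free organic subset, and fewer than ten rings are broken (larger labels would need the %nn form). `to_smiles` is a hypothetical helper, not any toolkit's API, and its output is not canonical.

```python
from collections import defaultdict

BOND_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are left implicit


def to_smiles(atoms, bonds, start=0):
    """Write a SMILES string by depth-first traversal of a hydrogen-free graph.

    atoms: list of element symbols; bonds: dict {(i, j): bond order 1, 2 or 3}.
    """
    adj = defaultdict(dict)
    for (i, j), order in bonds.items():
        adj[i][j] = order
        adj[j][i] = order

    visited, seen_edges = set(), set()
    children = defaultdict(list)    # spanning-tree children of each atom
    ring_marks = defaultdict(list)  # ring-closure labels printed after each atom
    next_label = [1]

    def dfs(u):
        visited.add(u)
        for v in sorted(adj[u]):
            edge = frozenset((u, v))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if v in visited:
                # Non-tree edge: break the cycle, label both ends with one digit.
                label = str(next_label[0])
                next_label[0] += 1
                ring_marks[v].append(BOND_SYMBOL[adj[u][v]] + label)  # earlier atom
                ring_marks[u].append(label)                            # later atom
            else:
                children[u].append(v)
                dfs(v)

    def emit(u):
        out = atoms[u] + "".join(ring_marks[u])
        kids = children[u]
        for k, v in enumerate(kids):
            piece = BOND_SYMBOL[adj[u][v]] + emit(v)
            # Every branch except the last one is wrapped in parentheses.
            out += piece if k == len(kids) - 1 else "(" + piece + ")"
        return out

    dfs(start)
    return emit(start)
```

Starting the traversal at a fluorine (`start=1` for the fluoroform graph `["C", "F", "F", "F"]`) gives FC(F)F instead of C(F)(F)F, which is why a plain depth-first writer does not by itself produce a unique string.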

Examples

Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O and that for ethanol is CCO.
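The implicit-hydrogen rule can be sketched as follows: take the smallest normal valence of the element that is at least the sum of the atom's bond orders, and fill the difference with hydrogens. The valence table follows the Daylight/OpenSMILES convention for the organic subset. That table is an assumption here, since the text above does not list valences. The sketch also ignores aromatic lowercase atoms, which need extra bookkeeping. `implicit_hydrogens` is a hypothetical helper.

```python
# Normal valences assumed for unbracketed, aliphatic organic-subset atoms.
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """Number of hydrogens assumed on an unbracketed atom."""
    for valence in DEFAULT_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # bonds exceed every normal valence: no implicit hydrogens
```

For ethanol (CCO) the bond-order sums are 1, 2 and 1, giving 3 + 2 + 1 = 6 hydrogens. For water (O) the sum is 0, giving 2.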
   The double-bonded carbon dioxide is represented as O=C=O and the triple-bonded hydrogen cyanide as C#N.
   Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform, which could also be written as the non-canonical string FC(F)F. Cyclohexane is represented as C1CCCCC1: the two occurrences of the ring label 1 mark the two atoms joined by a ring-closure bond, forming a ring of six carbons. Note that the label is the numeral (in this case the 1) rather than the combination 'C1'. Aromatic C, O, S and N atoms are written in lower case as 'c', 'o', 's' and 'n' respectively. Bonds in an aromatic ring are rarely marked explicitly except in SMARTS search patterns. Thus benzene is c1ccccc1.
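Reading these rules in the other direction, a small parser can rebuild the molecular graph from a bracket-free string. It keeps a stack for branch points and a table of open ring labels. This is a sketch under stated assumptions: only organic-subset atoms, '=', '#', parentheses and single-digit ring labels are handled (no brackets, '.', '%nn' or stereo marks). Bonds between lowercase aromatic atoms are simply recorded with order 1. `parse_smiles` is a hypothetical helper.

```python
import re

# Two-letter halogens first, then one-letter atoms, bonds, branches, ring digits.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[=#()]|\d")


def parse_smiles(smiles):
    """Return (atoms, bonds) with bonds as {(i, j): order} for a bracket-free SMILES."""
    atoms, bonds = [], {}
    branch_stack = []  # atom to return to when a branch closes
    open_rings = {}    # ring digit -> (atom index, bond order written at opening)
    prev, order = None, 1
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character {smiles[pos]!r} at {pos}")
        tok, pos = m.group(), m.end()
        if tok == "=":
            order = 2
        elif tok == "#":
            order = 3
        elif tok == "(":
            branch_stack.append(prev)      # branch hangs off the current atom
        elif tok == ")":
            prev = branch_stack.pop()      # resume from the branch point
        elif tok.isdigit():
            if tok in open_rings:          # second occurrence closes the ring
                j, ring_order = open_rings.pop(tok)
                bonds[(j, prev)] = order if order > 1 else ring_order
            else:                          # first occurrence opens it
                open_rings[tok] = (prev, order)
            order = 1
        else:                              # an atom: bond it to the previous atom
            atoms.append(tok)
            idx = len(atoms) - 1
            if prev is not None:
                bonds[(prev, idx)] = order
            prev, order = idx, 1
    if open_rings or branch_stack:
        raise ValueError("unclosed ring label or branch")
    return atoms, bonds
```

Both C(F)(F)F and FC(F)F parse to one carbon bonded to three fluorines, only with the atoms numbered in a different order.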

Isomeric SMILES

Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F is one representation of trans-difluoroethene, in which the F atoms are on opposite sides of the double bond, whereas F/C=C\F is one possible representation of cis-difluoroethene, in which the F atoms are on the same side of the double bond, as shown in the figure.
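For the simple linear form X/C=C/Y the rule reduces to a comparison. If the same symbol appears on both sides of the double bond, the two substituents are on opposite sides (trans); if the symbols differ, they are on the same side (cis). The sketch below handles only that one pattern and is not a general stereo parser. `double_bond_config` is a hypothetical name.

```python
import re

# X{/ or \}C=C{/ or \}Y with organic-subset end atoms; nothing else is accepted.
PATTERN = re.compile(r"(Cl|Br|[BCNOPSFI])([/\\])C=C([/\\])(Cl|Br|[BCNOPSFI])")


def double_bond_config(smiles):
    """Return 'cis' or 'trans' for a string of the form X/C=C/Y or X/C=C\\Y."""
    m = PATTERN.fullmatch(smiles)
    if m is None:
        raise ValueError("only the simple X/C=C/Y form is handled")
    _, first, second, _ = m.groups()
    # Same bond character on both sides: opposite sides (trans); different: cis.
    return "trans" if first == second else "cis"
```

Note that F\C=C\F uses the same character twice, so it describes the same trans isomer as F/C=C/F.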

Extensions

SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. It is used to specify search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings when, in fact, it is achieved by the computationally more intensive search for a subgraph isomorphism in the graphs reconstructed from the SMILES representations.
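The following sketch illustrates the graph-based approach with a plain backtracking search. Each query atom is mapped to a distinct target atom of the same element, and every query bond must be present in the target with the same order. Because it compares graphs, it does not matter which SMILES string the target was written from: "C(=O)O" is not a substring of "OC(CC)=O", yet both strings describe a carboxyl group in propionic acid. Practical tools use pruned algorithms such as Ullmann's or VF2 and support SMARTS wildcards. `has_substructure` is a hypothetical name, and this naive version is exponential in the worst case.

```python
def _adjacency(atoms, bonds):
    adj = {i: {} for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        adj[i][j] = order
        adj[j][i] = order
    return adj


def has_substructure(query, target):
    """True if the query graph maps injectively onto part of the target graph.

    query, target: (atoms, bonds) pairs, atoms as element symbols and
    bonds as {(i, j): order}. Non-induced subgraph isomorphism by backtracking.
    """
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    q_adj, t_adj = _adjacency(q_atoms, q_bonds), _adjacency(t_atoms, t_bonds)
    mapping = {}  # query atom index -> target atom index

    def extend(qi):
        if qi == len(q_atoms):
            return True  # every query atom has been placed
        used = set(mapping.values())
        for ti in range(len(t_atoms)):
            if ti in used or t_atoms[ti] != q_atoms[qi]:
                continue
            # Bonds to already-mapped query atoms must exist with the same order.
            if all(t_adj[ti].get(mapping[qj]) == order
                   for qj, order in q_adj[qi].items() if qj in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]  # backtrack
        return False

    return extend(0)
```

With the carboxyl query `(["C", "O", "O"], {(0, 1): 2, (0, 2): 1})`, the search succeeds on the propionic acid graph and fails on ethanol, which has no C=O bond.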

Conversion

SMILES can be converted back to 2-dimensional representations using Structure Diagram Generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to a 3-dimensional representation is achieved by energy minimization approaches. There are many downloadable and web-based conversion utilities.




© 2007-8 totallyexplained.com. This article contains text from the Wikipedia article Simplified molecular input line entry specification and is released under the GFDL.